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METHOD AND APPARATUS FOR PERFORMING MODULAR ARITHMETIC 

BACKGROUND OF THE INVENTION 

Field of the Invention 

Many electronic interactions require the provision of a certain level of 
5 security to ensure that the data contained in a message transfer is difficult to 
intercept and decode, or it is capable of being verified as being genuine, or both. 
To achieve these ends, it is possible to encrypt data according to one of many 
possible schemes. A popular scheme is called public key cryptography (e.g., 
PGP). Public key cryptography enables a particular message to be encoded 
10 according to an individual's private key and a third party's public key - both are 
long fixed numbers. The message may then be decoded by the third party through 
use of their private key. In this way, each party may keep their private key secret 
and thus control who is able to receive and decode any given message. 

One of the key elements of encryption systems is the ability to be 
1 5 able to perform modular arithmetic. The basic calculation which is performed may 
be written as : 

S =AB mod N (1) 

where A, B and N are large numbers, typically including many hundreds of digits. 

Cryptography systems are generally mathematically complex and 
20 can pose a high computational overhead on any system which implements them. 

Description of the Related Art 

Prior art systems for performing modular arithmetic make use of 
Montgomery's theorem, which has been used in many software and hardware 
implementations of modular arithmetic algorithms. Implementations using 
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Montgomery's theorem are able to compute a value for S without first multiplying A 
and B and then dividing by A/. Most of the hardware implementations rely on an 
iterative approach which decomposes A into k blocks of p bits to limit the size of 
the hardware operators required. Further advances have used a serial 
5 architecture to further reduce the circuit size. Such architectures are generally 
based around two serial multipliers, FIFO elements, and the pre-computation of a 
constant J 0i such that : 

J 0 .Afe-1mod2 p (2) 

k and p are both positive integers, and the binary representation of a positive 
1 0 integer X, where X < 2 kp may be given by: 

i=0 

where 0 < X[/] < 2, i.e., X may be either 0 or 1 . 

Throughout this specification, square brackets [ ] refer to a particular 
bit position in a multi-bit word e.g., X[/] refers to the i th bit of word X. Angle 
1 5 brackets < > refer to a particular block of a multi-bit word e.g., X</> refers to the i th 
block of word X. Parentheses ( ) refer to the value of a word at a particular 
iteration of a loop function e.g., X(i) refers to the value of word X at the i th iteration. 

A definition for X\j:k], where j > /c, is that X is a positive integer having 
a total length of y+1-/c bits, such that X\j\ is the MSB and X{k] is the LSB. 
20 The base 2 P representation of X is given by: 

X^Xii)!* (4) 

<=0 

where 0 < X<i> < 2 P 
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In the following description, it is assumed that N is an odd integer 
such that 2 P< *" 1) < N < 2 kp , and that both A and B are less than N. A p-bit constant, 
Jo, is thus defined as: 



J 0 .A/<0>s-1.mod2 p (5) 

5 N is the modulus number which is used in all public key cryptography 

systems. It is defined as the product of two large prime numbers (i.e., »2) and 
must therefore be odd. 

The prior art hardware implementation of the Montgomery theorem 
may be described by the following pseudo-code. 

10 

1. procedure MM-BASIC(ABJV) 

2. S(-1) = 0 

3. for /'=0 to /c-1 

4. T = S(i - 1 ) + A<i>B 
15 5. Y 0 = (7. Jo) mod 2 P 

6. S(i) = (T+ NY 0 )/2 P 

7. if S(i) > N then S(i) = S(i) - N 

8. end for 

20 The implementation of this pseudo code in hardware is shown in a 

simplified form in Figure 1 . The architecture is constructed in serial form so that 
one bit of the solution is generated for each clock cycle. Such an architecture, as 
opposed to a parallel one, minimizes the amount of hardware required at the 
expense of speed. 

25 The circuit of figure 1 is arranged to receive five different input 

signals: A[k] 200; B[f\ 205; S(/-1) 210; GE(/-1) 215; and N[f\ 220. 



3 



Serial Multiplier 110 accepts as inputs, a fixed p-bit word, A<i> 
produced by register 105, and a one-bit data stream B[f\ 205. It then acts to 
produce the output, (>4</>.B), one bit at a time. 

Multiplier 1 10 is configured internally as shown in Figure 2. The two 
5 inputs are the output 340 of register 105 and B[f\ 205. The two inputs 205, 340 are 
ANDed together in AND gate 300. The result of this operation is fed into Carry 
Save Adder 310, along with two other inputs. The first of these other inputs is the 
carry output (C) derived from the fed back output from p-bit register 31 5. The 
other input to the Adder is derived from the result output (R) of p-bit register 320 
10 which has been divided by 2 in divider 305. Registers 315, 320 are positioned 
immediately after the Carry Save Adder 310 and each receives one of the twin 
outputs produced by the adder. 

The Carry Save Adder 310 is arranged to transform a sum of three 
numbers into a sum of two numbers such that: 

15 2.C + R = X + Y + Z (6) 

The Carry Save Adder 310 computes C{t) and R(t) based on the 
following bitwise Boolean equations. 

C(t) = (C(t - 1 ) OR R(t - 1 )/2) AND (C{t - 1 ) 
AND B[f\.A<i>) AND (R(t - 1 )/2 AND b[T\A<i>) (7) 

20 R(t) = C(t - 1) © R[f\A<i> (8) 

In a simplified notation: 

0(f), R(t) = SERIALJVIULT {B[f\A<i> t C(M ), R(M )) (9) 
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The procedure MM_BASIC, already shown, may be written in a form 
which shows the serial operations explicitly: 

1. procedure MM-SERIAL(A B, N) 

5 2. S(-1) = 0 

3. GE(-1) = 0 

4. for /'=0 to /c-1 



5. Computation of Y 0 

6. for t= 0 to p-1 

10 7. C S i(f). Ksi(0 = SERIAL_SUB(C S i(M), GE(/-1) . A/tf, S(/-1)[f]) 

8. C M1 (f), = SERIAL_MULT(B[f] . AO, C M1 (M), R M i(M)) 

9. C„i(0. R*i(0 = SERIAL_ADD( C/u(M ), R M :(t)[0], R s ,(t)) 

10. C M2 (t), R M2 (t) = SERIAL_MULT(R^(f) . J 0 , C M2 (M), Krf-1)) 

11. Vo[fl = ^ 2 (0 
15 12. end for 

13. # mail loop: computation of S(i) 

14. for t = 0 to kp + p-1 

15. Csi(f), Rsi(0 = SERIAL_SUB(C S i(M), GE(/-1) . N[t], S(/-1)[f]) 

16. C M1 (0, f?/wi(0 = SERIAL_MULT(B[f] . /4</>, C M1 (M), R M i(M)) 
20 17. C^i(t), R„i(f) = SERIAL_ADD( C/u(M ), F? M i(f)[0], R S i(0) 

18. C M2 (t), R M2 {t) = SERIAL_MULT(R^(0 . J 0 , C W2 (M), K M2 (M)) 

19. C A2 (0, RA2(t) = SERIAL_ADD(C A i(M ), R M2 (f)[0], R A «)) 

20. S(/)[f-p] = R A2 (0 

21. SGE(t) = SERIAL_GE(SG£(M), A/[f-p], S(/)[f-p]) 
25 22. end for 

23. GE(/) = SGE(/cp+p-1) 



24. end for 

The total number of clock cycles required to compute the result 
30 according to the above scheme is k(kp+2p). 
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BRIEF SUMMARY OF THE INVENTION 

In accordance with one embodiment of the present invention, an 
apparatus is provided that includes inputs A, B and N, and an output S, said 
apparatus being arranged to perform a modular operation, S=A.B mod N, the 
5 apparatus including a 2-stage Carry Save Adder (2-CSA) and a 1 -stage Carry 
Save Adder (1-CSA), the 2-CSA being arranged to receive 5 input signals: U 0 , 
being the partial product of N and Y 0 ; Ui, being the subtraction of a previous 
version of S and 1)6 wherein 1)6 is either N or 0 depending on the value of the 
comparison between the result of the previous iteration and N; U 2 , being the partial 

10 product of B with the current version of A; lb, being S/2; and U 4 , being the carry 
output of the 1 -CSA; where result and carry outputs of the 2-CSA form two of three 
inputs to the 1-CSA, wherein the result (R) output of the 1-CSA is the desired 
result (S), and the third input to the 1-CSA is a compensation signal arranged to 
allow S to be calculated without knowing the constant J 0 , where JoN<0> = -1 . mod 

1 5 2 P , where p is a block length into which A is sub-divided. 

In a second broad form, an embodiment of the present invention 
provides An iterative method of performing a modular operation of S = A.B mod N, 
where A, B and N are encoded as multi-bit digital words, including the following 
steps: a) setting S(-1) to 0, and i to 0; b) setting S(i) to (S(i-1) + A<i>B + NY 0 )/2 P ; 

20 c) setting S(i) to (S(i) - N) if S(i) > N; d) repeating steps b) and c) k times, wherein: 
i is a loop counter; k is a number of blocks of p bits length into which A is divided; 
Yo = ((T.Jo) mod 2 P ); J 0 N = -1mod2 p ; and Y 0 is calculated one bit at a time, based 
on the fact that (T + NY 0 ) is a multiple of 2 P . 

In accordance with another embodiment of the invention, an 

25 apparatus for performing modular arithmetic is provided, the apparatus includes a 
first AND gate configured to receive first and second inputs and to generate a first 
output; a second AND gate configured to receive third and fourth inputs and to 
generate a second output; a divider configured to generate a third output; a first 
carry-save adder configured to receive as inputs the first output from the first AND 
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gate, the second output from the second AND gate, and the third output from the 
divider and to generate fourth and fifth outputs; a second carry-save adder 
configured to receive the combination of the fourth output and a fifth input as one 
input and to receive the fifth output as a second input and to generate a carry 
5 output that is fed back into a third input of the second carry-save adder and to 
generate a result output that is an input to the divider and the desired result. 

In accordance with yet a further embodiment of the invention, an 
apparatus for performing modular arithmetic is provided, the apparatus having 
inputs A, B, and N and an output S; a first carry-save adder configured to receive 

10 five input signals that include: U 0 , the partial product of N and Y 0j where Y 0 equal 
((TJ 0 ) mod 2 P ); Ui, the subtraction of a previous version of S and U 6 , wherein U 6 is 
one of N or 0 depending on the value of a comparison between a result of a 
previous iteration and N; U2, a partial product of input B and a current version of 
input A; U 3 , the result of S/2; and U 4 , a carry output of a second carry-save adder; 

1 5 the first carry-save adder configured to generate a result output and a carry output; 
the second carry-save adder configured to receive the result output and the carry 
output from the first carry-save adder and to receive a compensation signal as a 
third input and to generate a desired result and the carry output U 4 ; and wherein 
J 0 N<0> = -1 1, mod 2 P , where p is a block length into which A is sub-divided. 

20 Other features and benefits of the invention will become apparent in 

the following description of various embodiments of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

For a better understanding of the present invention and to 
understand how the same may be brought into effect, the invention will now be 
25 described by way of example only, with reference to the appended drawings in 
which: 

Figure 1 shows a simplified prior art circuit for implementing modular 
arithmetic according to Montgomery's theorem; 
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Figure 2 shows a prior art serial/parallel multiplier or carry save 

adder; 

Figure 3 shows a merged multiplier as used in embodiments of the 
invention; and 

5 Figure 4 shows a hardware implementation according to an 

embodiment of the invention. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention retains a serial architecture to accomplish the 
calculation, but embodiments of the inventions do not require pre-knowledge of the 
10 constant, J 0 . Embodiments of the invention calculate Y 0 = ((T . J 0 )mod 2 P ) one bit 
at a time, based on the fact that (T + NY 0 ) must be a multiple of 2 P . In this way, the 
complex mathematical functions required to pre-compute Jo can be dispensed 
with. 

With this implicit knowledge, the procedure MM-BASIC described 
1 5 previously, may now be written as MM-SIMPLE: 

1 . procedure MM-SIMPLE(A 6, N) 

2. S(-1) = 0 

3. for / = 0 to /c-1 

20 4. S(i) = (S(/-1 ) + A<i>B + NY 0 )I2 P 

5. if S(i) > N then S(i) = S(i) - N 

6. end for 

The above serial implementation of MM-SIMPLE is more efficient 
25 than the prior art implementation of MM-BASIC that has two multipliers required 
instead of the single multiplier in the disclosed embodiments of the invention. The 
gain, in terms of fewer components, is a total of 2p registers plus the two serial 
adders 120 and 155. The removal of the need for these components removes a 
significant amount of circuitry, and thus the resulting architecture requires less 
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space and consumes less power to achieve the same result This design also 
calculates the result in fewer clock cycles. 

Figure 3 shows the resultant hardware implementation which may be 
used to perform the steps of procedure MM-SIMPLE presented above. As shown 
5 therein, an apparatus for performing modular arithmetic is provided that includes a 
first AND gate 400 receiving inputs 470 and 475 and generating a first output, a 
second AND gate 410 receiving as inputs 480 and 485 and generating a second 
output, and a divider 420 generating a cleared output. A first carry-save adder 430 
receives on a first input the first output of the first AND gate 400, on a second input 

10 the second output of the second AND gate 410, and on a third input an output of 
the divider 420, and generates therefrom a first output and a second output that 
are received as first and second inputs of a second carry-save adder 440. It is to 
be noted that the first output of the first carry-save adder 430 is combined with a 
fifth input 465 prior to being received at the second carry-save adder 440. The 

15 second carry-save adder 440 generates on a first output a carry out that is 
received at a first register 450, the output of which becomes a third input to the 
second carry-save adder 440. A second output of the carry-save adder 440 is 
received at a second register 460 and becomes the desired result 490 that is also 
the input to the divider 420. 

20 Yq is computed bit by bit during the first p cycles of the loop, starting 

at line 15 of the procedure MM-SERIAL. Assuming that at cycle g<p, the bits 0, 
1 ,...q-1 have already been computed, leaving only bit q to be discovered. 

According to embodiments of the present invention, if, at cycle q, the 
LSB of the 2-stage Carry Save Adder shown in Figure 3 is T, then N[q:0] is added 

25 to the intermediate result, and Y 0 [q] = 1 . 

This may be proved as follows. At the q th step, the intermediate 
values from the first Carry Save Adder may be given as 

S = 2C + R (10) 



=(>A</>.S[g:0]+y 0 [t7-1 :0].A/[q:0]+S</-1 >[q:0])/2 q (1 1 ) 



Assuming that the q m bit of Y 0 is a T, then the above equation may 
be re-written as: 

S'=(A<i>.B[q:0] + (2 q + Yofa-1 :0]).A/[g:0] + S</-1 >[q:0])/2 Q (12) 

5 =S + N[q:0] (13) 

As the LSB of N is always 1 , since it is a large prime number and, 
therefore, odd, then from the above equations, it can be seen that the LSBs of S 
and S' are always inverted. Therefore, it is possible to guarantee that the LSB of 
the result is 0 in the first p steps by choosing either S or S'. The choice of S' 
1 0 implies that the q th bit of Yo must be forced to equal 1 . 

The above step is repeated at each cycle q<p, so that at the end all 
bits of Yo are discovered. 

The procedure, MM-SERIAL-SIMPLE shown below is a pseudo-code 
implementation of an embodiment of the present invention, and is a version of the 
1 5 previously presented MM-SERIAL adapted according to the above results. 



1. procedure MM-SERIAL-SIMPLE(A, B, N) 

2. S(-1) = 0 

3. GE(-1) = 0 

20 4. for / = 0 to /f-1 

5. # main loop: computation of S(i) 

6. for t = 0 to /cp+p-1 

7. C s1 (f), Rs-i(f) = SERIAL_SUB(C S1 (M), GE(/-1) . N[f\, S(/-1)[f]) 

8. dnt, Rmt = 2-STAGE_CSA(e[f] . AO, C M (M ), R M (M )/2, 
25 N[f\ . Yo) 

9. if t < p and R int [0] = 1 then 
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1 0. C M (t), Rdt) = CSA(A/[f:0], C int , Rint) 

11. V 0 [f] = 1 

12. else 

13. Cm(0. ^m(0 = Cint, Rint 

5 14. end if 

15. S(i)[t-P) = «m(0[0] 

16. SGE(f) = SERIAL_GE(SGE(M ), A/[f-p], S(/)[f-p]) 

17. end for 

18. GE(i) = SGE(kp+p-l) 



10 19. end for 

The conditional statement at line 9 of the above procedure may be 
considered to trigger a compensation event which, if t<p and Rint[0] =1 , causes the 
value of register 525 N de i to be applied to the input of the 1 -stage CSA (1-CSA) 
1 5 540. If the condition is not satisfied, then the C and R outputs of the 2-stage CSA 
(2-CSA) 520 merely feed straight into the 1-CSA and no compensation is 
performed. 

It is the addition of the compensation function that directly removes 
the need to explicitly compute Jo- 
20 In figure 4, the compensation function is implemented by register 

525, AND gate 530, MUX 535. The MUX 535 effectively performs the conditional 
IF statement of line 9 of MM-SERIAL-SIMPLE, and if R in t[0] is equal to 1, then the 
contents of register 525 is applied to 1-CSA 540. 

The above procedure (MM-SERIAL-SIMPLE) is further explained in 
25 the procedure below (MM-SERIAL-SIMPLE_enhanced), which includes further 
details on selected ones of the internal signal nets. 

These internal nets are labelled from Uo to Us and directly 
correspond with selected internal nets shown in Figure 4. 

30 1. procedure MM-SERIAL-SIMPLE_enhanced(A, 8, N) 
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2. 


S(-1) = 0 






3. 


GE(-1) = 


0 




4. 


Anext = A[p-1 :0] 




5. 


for /'=0 to fr-1 


5 


6. 


# main loop: computation of S(i) 




7. 


Ndel 


= 0 




8. 


Y 0 = 


= 0 




9. 


R = 


0 




10. 


C = 


0 


10 


11. 


A current ~~ A nex t 




12. 


A ne;( , = A[(/+1)(p-1):(/+1 )p] 




13. 


for t = 0 to /cp+p-1 




14. 




C/ 0 = AND2(A/[f], Yo) 




15. 




C/ 6 = AND1(GE(/-1), A/[f]) 


15 


16. 




L/i = SUB1(U 6 , S(/-1[f]) 




17. 




U 2 = AND3(B[f], A curren ,) 




18. 




U 3 = RI2 




19. 




u 4 = c 




20. 




C int) R int = 2-STAGE-CSA(ty 0 , U u 


20 


21. 




U 7 = MUX(R (n ,[0],0) 




22. 




U 5 = AND4(U 7 , N del ) 




23. 




if f < p then 




24. 




Void = (A 




25. 




/WO = Ml 


25 


26. 




L/8=0 




27. 




else 




28. 




# A/de/ acts as a shift register 




29. 




U 8 = N del [0] 




30. 




Ndel = N d ei/2 


30 


31. 




Ndelip-1] = N[t] 




32. 




endif 
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33. C,R = CSA(L/ 5 , Qnt.Rini) 

34. S(i)[f\ = R[0] 

35. SGE(t) = GE(U a , R[0]) 

36. end for 

5 37. GE(i) = SGE(kp+p--\) 



38. end for 

As an example, presented below are details of how an embodiment 
of the invention operates on some sample input data. The following inputs are 
10 provided in 32-bit format: 

A = C7197F0E 

B = CCEFe4E4_774F9EE5_848D84£6 
N = D077EC53_F4A427/H_D7816723 

1 5 The result of the Montgomery multiplication of A by B is given by 

(AB+NY 0 )/2 P . Before the computation starts, the registers of the multiplier are 
initialized as follows. 

N 0 = 00000003 
20 Y 0 = 00000000 

RC = 0_00000000 
B[f] = 6 
A/[f] = 3 

25 For the sake of simplicity, the registers R and C have been summed 

into register RC, and the computation is performed 4 bits (a nibble) at the time, 
thus setting p=4. 



1 . Computation of the intermediate results, based on the partial 
products 



A/[f]Y0 = 0_00000000 

+B[f].A = 4_A498M54 

+KC/16 =0_00000000 

= Intermediate = 4_AA98FA54 

2. Find the first 4 bits of compensation value (Z) such that the 4 LSBs of 
Intermediate+Z.No are all zero. 

Z =4 

3. Add the partial product Z.N 0 to Intermediate 

Intermediate = 4_AA98FA54 

+Z.N 0 = 0_0000000C 

=RC =4_AA98FA60 

4. Update the registers with- the new values and restart the cycle 
00000023 Y 0 = 00000004 RC = 4_AA98FA60 B[f\ = E N[f\=2 
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1 . Computation of the intermediate results, based on the partial 
products 



N[f\Y0 = 0_0000008C 

+B[f\.A = A.E364F2C4 

+RC/16 = 0_4AA98FA6 

intermediate = e_2E0E8272 

2. Find first 4 bits of compensation (Z) such that the 4 Isb of 
Intermediate+Z.No are all zero. 

2 =A 

3. Add the partial product Z.N 0 to Intermediate 

Intermediate = B_2E0E8272 

+Z.N 0 =0_0000015E 
=RC = E_2E0E83D0 

4. Update the registers with the new values and restart the cycle 
00000723 Y 0 = 000000/44 RC = B_2E0E83D0 B[t\=A N[f\ = 7 
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1 . Computation 
products 



of the 



intermediate 



results, based on the partial 



N[t\.Y0 =0_0000047C 

+B[t\.A = 7_C6FEF68C 

+RC/16 =0_B2E0E83D 

intermediate = 8_79DFE345 

Find first 4 bits of compensation (Z) such that the 4 Isb of 
Intermediate+Z.NO are all zero. 



=9 



3. Add the partial product Z.A/ 0 to Intermediate 

Intermediate = 8_79DFE345 

+Z.N 0 = 0_0000403B 

=SUM 2 = 8_79E02380 

4. Update the registers with the new values and restart the cycle 

No = 00006723 Y 0 = 000009A4.RC = 8_79E02380 B[f\ = 8 N[f\ = 6 

This process is repeated until all the bits of Y 0 are discovered. At this 
10 stage, the compensation phase is no longer needed so the computation iterates 
over the remaining bits of B and N. The step by step result at each phase is given 
by the following table: 
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Cycle 


No 


Yo 


RC 


B[f\ 


N[(\ 


0 


XXXXXXXX 


XXXXXXXX 


xxxxxxxxxx 


X 


X 


1 


00000003 


00000000 


0000000000 


6 


3 


2 


00000023 


00000004 


04AA98FA60 


E 


2 


3 


00000723 


000000A4 


0B2E0E83D0 


A 


7 


4 


00006723 


000009A4 


0879E02380 


8 


6 


5 


00016723 


000009A4 


06C06A3480 


D 


1 


6 


00816723 


000A09A4 


0A88602800 


8 


8 


7 


07816723 


000A09A4 


06E1A24810 


4 


7 


8 


D78 16723 


090A09A4 


03CE530470 


8 


D 


9 


D78 16723 


790A09A4 


0CCFBD78O0 


5 


4 


10 


4D781672 


790A09A4 


0694A37956 


E 


A 


11 


A4D78167 


790A09A4 


1007138AC1 


E 


7 


12 


7A4D7816 


790A09A4 


0F331C6EEC 


9 


2 


13 


27A4D781 


790A09A4 


08E52B51B4 


F 


A 


14 


A27A4D78 


790A09A4 


10F3358755 


A 


A 


15 


AA27A4D7 


790A09A4 


0D9096AF69 


7 


4 


16 


4AA27A4D 


790A09A4 


082EE40AE8 


7 


F 


17 


F4AA27A4 


790A09A4 


0D0C374AAC 


4 


3 


18 


3F4AA27A 


790A09A4 


0558478DCE 


E 


5 


19 


53F4AA27 


790A09A4 


0D961B9BD4 


A 


C 


20 


C53F4AA2 


790A09A4 


0E4CD923F9 


B 


E 


21 


EC53F4AA 


790A09A4 


1011728ED1 


F 


7 


22 


7EC53F4A 


790A09A4 


0FFADBDE3B 


E 


7 


23 


77EC53F4 


790A09A4 


0F3258F423 


C 


0 


24 


077EC53F 


790A09A4 


0A485783EA 


c 


D 


25 


D077EC53 


790A09A4 


101F39EA3A 


0 


0 


26 


0D077EC5 


790A09A4 


0101F39EA3 


0 


0 


27 


00D077EC 


790A09A4 


00101F39EA 


0 


0 



Cycle 


No 


Yq 


RC 


Bid 


N[f\ 


28 


000D077E 


790A09A4 


000101F39E 


0 


0 


29 


0000D077 


790A09A4 


00001 01 F39 


0 


0 


30 


00000D07 


790A09A4 


000001 01 F3 


0 


0 


31 


000000D0 


790A09A4 


0000001 01 F 


0 


0 


32 


0000000D 


790A09A4 


0000000101 


0 


0 


33 


00000000 


790A09A4 


0000000010 


0 


0 


34 


00000000 


790A09A4 


0000000001 


0 


0 



Notice that the serial output result can be read directly as the right 
most nibble of the RC column. It is also interesting to notice the shifting pattern of 
N 0 . From cycle 1 to 8, the register behavior is comparable to a stack, where the 
5 nibble are pushed from the left. From cycle 9 onward, the register behaves as a 
right shift register. The output of this register shall be used as the input of a 
comparator which detects if the results is greater or equal to A/. 

/= 79040944 

RESULT^ 1_01 F39E43_A438194E_C8954C16_00000000 

10 In the light of the foregoing description, it will be clear to the skilled 

man that various modifications may be made within the scope of the invention. 

The present invention includes any novel feature or combination of 
features disclosed herein either explicitly or any generalization thereof irrespective 
of whether or not it relates to the claimed invention or mitigates any or all of the 

1 5 problems addressed. 

All of the above U.S. patents, U.S. patent application publications, 
U.S. patent applications, foreign patents, foreign patent applications and non- 
patent publications referred to in this specification and/or listed in the Application 
Data Sheet, are incorporated herein by reference, in their entirety. 
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From the foregoing it will be appreciated that, although specific 
embodiments of the invention have been described herein for purposes of 
illustration, various modifications may be made without deviating from the spirit 
and scope of the invention. Accordingly, the invention is not limited except as by 
5 the appended claims. 
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